Knowledge distillation is often used to transfer knowledge from a strong teacher model to a relatively weak student model. Traditional knowledge distillation methods include response-based methods and feature-based methods. Response-based methods are the most widely used but have a lower ceiling on model performance, while feature-based methods impose constraints on vocabularies and tokenizers. In this paper, we propose a tokenizer-free method, liberal feature-based distillation (LEAD). LEAD aligns the distributions of the teacher and student models; it is effective, extendable, and portable, and it places no requirements on vocabularies, tokenizers, or model architecture. Extensive experiments show the effectiveness of LEAD on several widely used benchmarks, including MS MARCO Passage, TREC Passage 19, TREC Passage 20, MS MARCO Document, TREC Document 19 and TREC Document 20.
Video super-resolution is one of the most popular tasks on mobile devices, widely used to automatically improve low-bitrate and low-resolution video streams. While numerous solutions have been proposed for this problem, they are usually computationally demanding, showing low FPS rates and poor power efficiency on mobile devices. In this Mobile AI challenge, we address this problem and ask the participants to design an end-to-end real-time video super-resolution solution for mobile NPUs optimized for low energy consumption. The participants were provided with the REDS training dataset containing video sequences for a 4X video upscaling task. The runtime and power efficiency of all models were evaluated on the MediaTek Dimensity 9000 platform with a dedicated AI processing unit capable of accelerating floating-point and quantized neural networks. All proposed solutions are fully compatible with this NPU, reaching up to 500 FPS and a power consumption of 0.2 [Watt / 30 FPS]. A detailed description of all models developed in the challenge is provided in this paper.
Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs, which have many computational and memory constraints. In this Mobile AI challenge, we address this problem and ask the participants to design an efficient quantized image super-resolution solution that can run in real time on mobile NPUs. The participants were provided with the DIV2K dataset and trained INT8 models to perform high-quality 3X image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board with a dedicated edge NPU capable of accelerating quantized neural networks. All proposed solutions are fully compatible with this NPU, reaching up to 60 FPS when reconstructing Full HD resolution images. A detailed description of all models developed in the challenge is provided in this paper.
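Video temporal grounding (VTG) aims to localize temporal moments in an untrimmed video according to a natural language (NL) description. Since real-world applications provide never-ending video streams, there is a demand for temporal grounding on long-form videos, which leads to two major challenges: (1) the long video length makes it difficult to process the entire video without reducing the sample rate, and it incurs a high computational burden; (2) accurate multi-modal alignment becomes more challenging as the number of candidate moments increases. To address these challenges, we propose CONE, an efficient window-centric coarse-to-fine alignment framework, which flexibly handles long-form video inputs with higher inference speed and enhances temporal grounding through our novel coarse-to-fine multi-modal alignment. Specifically, we slice the long video into candidate windows with a sliding window approach. Centering on windows, CONE (1) learns the inter-window (coarse-grained) semantic variance through contrastive learning and pre-filters the candidate windows relevant to the NL query, and (2) conducts intra-window (fine-grained) ranking of candidate moments using the strong multi-modal alignment ability of a contrastive vision-text pre-trained model. Extensive experiments on two large-scale VTG benchmarks for long videos consistently show substantial performance gains (from 3.13% to 6.87% on MAD and from 10.46% to 13.46% on Ego4D-NLQ), and CONE achieves SOTA results on both datasets. Analysis reveals the effectiveness of the components and the higher efficiency in long video grounding: our system improves inference speed by 2x on Ego4D-NLQ and 15x on MAD while keeping the SOTA performance of CONE.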
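The window slicing and coarse pre-filtering steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `slice_windows` and `prefilter`, the `window`, `stride`, and `top_k` parameters, and the idea that a coarse relevance score per window is already available are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Window = Tuple[int, int]  # [start, end) frame indices


def slice_windows(num_frames: int, window: int, stride: int) -> List[Window]:
    """Cut a video of num_frames frames into overlapping candidate windows.

    Hypothetical helper: the abstract only says a sliding window is used;
    the window length and stride here are illustrative parameters.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if num_frames <= window:
        return [(0, num_frames)]
    starts = list(range(0, num_frames - window + 1, stride))
    # Add one last window flush with the end so trailing frames are covered.
    if starts[-1] + window < num_frames:
        starts.append(num_frames - window)
    return [(s, s + window) for s in starts]


def prefilter(windows: Sequence[Window], scores: Sequence[float], top_k: int) -> List[Window]:
    """Keep the top_k windows with the highest coarse query-relevance score.

    The scores are assumed to come from some coarse-grained matching model;
    only the surviving windows would go on to fine-grained moment ranking.
    """
    ranked = sorted(zip(windows, scores), key=lambda pair: pair[1], reverse=True)
    return [w for w, _ in ranked[:top_k]]


if __name__ == "__main__":
    candidates = slice_windows(num_frames=11, window=4, stride=3)
    print(candidates)
    print(prefilter(candidates, [0.1, 0.9, 0.5, 0.2], top_k=2))
```

Filtering before fine-grained ranking means the expensive per-moment scoring runs on only `top_k` windows rather than on the whole video, which is the kind of saving the reported inference speedups rely on.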
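Optical flow estimation in omnidirectional videos faces two significant issues: the lack of benchmark datasets and the challenge of adapting video-based methods to the omnidirectional nature of the data. This paper proposes Flow360, the first perceptually natural synthetic omnidirectional benchmark dataset with a 360-degree field of view, containing 40 different videos and 4,000 video frames. We conduct a comprehensive characteristic analysis and comparison between our dataset and existing optical flow datasets, which demonstrates its perceptual realism, uniqueness, and diversity. To accommodate the omnidirectional nature, we present a novel Siamese representation learning framework (SLOF). We train our network in a contrastive manner with a hybrid loss function that combines a contrastive loss and an optical flow loss. Extensive experiments verify the effectiveness of the proposed framework and show a 40% performance improvement over state-of-the-art methods. Our Flow360 dataset and code are available at https://siamlof.github.io/.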
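Brain vessel image segmentation can serve as a promising biomarker for better prevention and treatment of different diseases. One successful approach is to treat segmentation as an image-to-image translation task and train a conditional generative adversarial network (cGAN) to learn the transformation between two distributions. In this paper, we present a novel multi-view approach, MLP-GAN, which splits a 3D volumetric brain vessel image into three different 2D views (i.e., sagittal, coronal, axial) and then feeds them into three different 2D cGANs. The proposed MLP-GAN not only alleviates the memory issue of the original 3D neural networks but also retains 3D spatial information. Specifically, we use U-Net as the backbone of our generator and redesign the skip connection pattern by integrating the MLP-Mixer, which has attracted much attention recently. With the MLP-Mixer, our model gains the ability to capture cross-patch information and learn global information. Extensive experiments on a public brain vessel dataset show that our MLP-GAN outperforms other state-of-the-art methods. We release our code at https://github.com/bxie9/mlp-gan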
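Relying on the premise that the performance of a binary neural network (BNN) can be largely restored by eliminating the quantization error between full-precision weight vectors and their corresponding binary vectors, existing works on network binarization frequently adopt the idea of model robustness to reach this objective. However, robustness remains an ill-defined concept without solid theoretical support. In this work, we introduce Lipschitz continuity, a well-defined functional property, as the rigorous criterion for defining the model robustness of BNNs. We then propose to retain Lipschitz continuity as a regularization term to improve model robustness. In particular, while popular Lipschitz-based regularization methods often collapse in BNNs because of their extreme sparsity, we design Retention Matrices to approximate the spectral norms of the targeted weight matrices, which can be deployed as an approximation of the Lipschitz constant of BNNs without computing the exact Lipschitz constant (NP-hard). Our experiments show that our BNN-specific regularization method can effectively strengthen the robustness of BNNs (as verified on ImageNet-C), achieving state-of-the-art performance on CIFAR and ImageNet.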
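To make the link between spectral norms and the Lipschitz constant concrete, here is a minimal sketch that estimates the spectral norm of a weight matrix by plain power iteration and multiplies per-layer norms into an upper bound. This is the textbook technique, not the Retention Matrices proposed in the paper; the function names and the iteration count are illustrative assumptions, and the product bound assumes a feed-forward stack with 1-Lipschitz activations.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def _matvec(a: Matrix, v: Sequence[float]) -> List[float]:
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def _norm(v: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def spectral_norm(w: Matrix, iters: int = 100) -> float:
    """Estimate the largest singular value of w by power iteration on w^T w.

    Starts from a fixed all-ones vector for reproducibility; the estimate
    can be too low if that vector is orthogonal to the top singular direction.
    """
    wt = [list(col) for col in zip(*w)]
    n_cols = len(w[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    for _ in range(iters):
        v = _matvec(wt, _matvec(w, v))
        n = _norm(v)
        if n == 0.0:
            return 0.0
        v = [x / n for x in v]
    return _norm(_matvec(w, v))


def lipschitz_upper_bound(layers: Sequence[Matrix]) -> float:
    """Product of per-layer spectral norms.

    This bounds the Lipschitz constant of a stack of linear layers
    interleaved with 1-Lipschitz activations (an assumption of this sketch).
    """
    bound = 1.0
    for w in layers:
        bound *= spectral_norm(w)
    return bound


if __name__ == "__main__":
    binary_w = [[1.0, 1.0], [1.0, -1.0]]  # a toy {-1, +1} weight matrix
    print(spectral_norm(binary_w))
    print(lipschitz_upper_bound([binary_w, [[3.0, 0.0], [0.0, 1.0]]]))
```

The product of layer norms is cheap to compute but generally loose, which is one reason approximations of the true constant are of interest.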
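Real-world visual search systems involve deployment on multiple platforms with different computing and storage resources. Deploying a unified model that fits the most constrained platform leads to limited accuracy. It is therefore desirable to deploy models with different capacities adapted to the resource constraints, which requires the features extracted by these models to be aligned in the metric space. The method for achieving this feature alignment is called "compatible learning". Existing research mainly focuses on the one-to-one compatible paradigm, which is limited in learning compatibility among multiple models. We propose a Switchable representation learning Framework with Self-Compatibility (SFSC). SFSC generates a series of compatible sub-models with different capacities through a single training process. The optimization of the sub-models faces gradient conflicts, which we mitigate from the perspectives of magnitude and direction. We dynamically adjust the priorities of the sub-models through uncertainty estimation so that they are co-optimized properly. In addition, gradients with conflicting directions are projected to avoid mutual interference. SFSC achieves state-of-the-art performance on the evaluated datasets.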
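The projection of conflicting gradients mentioned above can be sketched as follows. The abstract does not specify the projection rule, so this sketch assumes a common choice: when two gradients have a negative inner product, the component of one along the other is removed. The function name `project_conflict` is hypothetical.

```python
from typing import List, Sequence


def project_conflict(g: Sequence[float], h: Sequence[float]) -> List[float]:
    """Return g with its conflicting component along h removed.

    Assumed rule (not confirmed by the source): two gradients conflict when
    their inner product is negative; in that case g is projected onto the
    plane orthogonal to h. Non-conflicting gradients are returned unchanged.
    """
    dot = sum(a * b for a, b in zip(g, h))
    if dot >= 0.0:
        return list(g)
    h_sq = sum(b * b for b in h)  # strictly positive whenever dot < 0
    return [a - (dot / h_sq) * b for a, b in zip(g, h)]


if __name__ == "__main__":
    g_small = [1.0, 0.0]   # gradient from one sub-model (toy values)
    g_large = [-1.0, 1.0]  # gradient from another sub-model (toy values)
    print(project_conflict(g_small, g_large))
```

After projection the adjusted gradient is orthogonal to the other sub-model's gradient, so a step along it does not, to first order, increase that sub-model's loss.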
We present LogiGAN, an unsupervised adversarial pre-training framework for improving the logical reasoning abilities of language models. After automatically identifying logical reasoning phenomena in a massive text corpus via detection heuristics, we train language models to predict the masked-out logical statements. Inspired by the facilitation effect of reflective thinking in human learning, we simulate the learning-thinking process with an adversarial Generator-Verifier architecture to assist logic learning. LogiGAN implements a novel sequential GAN approach that (a) circumvents the non-differentiability challenge of the sequential GAN by using the Generator as a sentence-level generative likelihood scorer whose learning objective is to reach scoring consensus with the Verifier; and (b) is computationally feasible for large-scale pre-training with arbitrary target length. Both base and large size language models pre-trained with LogiGAN show clear performance improvements on 12 datasets requiring general reasoning abilities, revealing the fundamental role of logic in broad reasoning as well as the effectiveness of LogiGAN. Ablation studies on LogiGAN components reveal the relative orthogonality between linguistic and logic abilities and suggest that the facilitation effect of reflective thinking might also generalize to machine learning.
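One of the central problems in auction design is developing an incentive-compatible mechanism that maximizes the auctioneer's expected revenue. While theoretical approaches have hit a bottleneck in multi-item auctions, there has recently been much progress on finding the optimal mechanism through deep learning. However, these works either focus on a fixed set of bidders and items or restrict the auction to be symmetric. In this work, we overcome such limitations by factoring the contextual information of bidders and items into the auction learning framework. We propose $\mathtt{CITransNet}$, a context-integrated transformer-based neural network for optimal auction design, which maintains permutation-equivariance over bids and contexts while being able to find asymmetric solutions. We show through extensive experiments that $\mathtt{CITransNet}$ can recover the known optimal solutions in single-item settings, outperform strong baselines in multi-item auctions, and generalize well to cases other than those seen in training.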